
HUMAN NEUROSCIENCE 



METHODS ARTICLE 

published: 09 September 2014 
doi: 10.3389/fnhum.2014.00697




Sequential sampling model for multiattribute choice alternatives with random attention time and processing order

Adele Diederich 1* and Peter Oswald 2

1 Cognitive Psychology, School of Humanities and Social Sciences, Jacobs University, Bremen, Germany

2 Mathematics, Modeling, and Computing Center, School of Engineering and Science, Jacobs University, Bremen, Germany



Edited by: 

Jose Antonio Diaz, Universidad de 
Granada, Spain 

Reviewed by: 

Chris Donkin, University of New 
South Wales, Australia 
Jose Antonio Diaz, Universidad de 
Granada, Spain 

"Correspondence: 

Adele Diederich, Cognitive
Psychology, School of Humanities
and Social Sciences, Jacobs
University, Campus Ring 1,
Bremen 28759, Germany
e-mail: a.diederich@jacobs-university.de



A sequential sampling model for multiattribute binary choice options, called the multiattribute attention switching (MAAS) model, assumes a separate sampling process for each attribute. During the deliberation process, attention switches from one attribute to the next. The order in which attributes are considered, as well as how long each attribute is considered (the attention time), influences the predicted choice probabilities and choice response times. Several probability distributions for the attention time with different variances are investigated. Depending on the time and order schedule, the model predicts a rich choice probability/choice response time pattern, including preference reversals and fast errors. Furthermore, the difference between finite and infinite decision horizons for the attribute considered last is investigated. For the former case, the model predicts a probability $p_0 > 0$ of not deciding within the available time. The underlying stochastic process for each attribute is an Ornstein-Uhlenbeck process approximated by a discrete birth-death process. All predictions also hold for the widely applied Wiener process.

Keywords: sequential sampling, multiattribute, attention time, time schedule, order schedule, finite time horizon, Ornstein-Uhlenbeck, Wiener



1. INTRODUCTION 

Sequential sampling models are powerful models that account simultaneously for choice probabilities and choice response times. They have become the dominant approach to modeling decision processes in cognitive science. Their applications range over a variety of psychological tasks, from basic perceptual decisions to complex preferential choice tasks. Early on, they were applied to identification and discrimination tasks (e.g., Edwards, 1965; Laming, 1968; Pike, 1973; Link and Heath, 1975; Heath, 1981; Ashby, 1983), memory retrieval (e.g., Stone, 1960; Ratcliff, 1978; Van Zandt et al., 2000), and classification (e.g., general recognition theory, Ashby, 2000; exemplar-based random walk models of classification, Nosofsky and Palmeri, 1997) to account for speed-accuracy data. They have also been used for preferential decision tasks (e.g., decision field theory (DFT), Busemeyer and Townsend, 1993; multiattribute dynamic decision model, Diederich, 1997; Diederich and Busemeyer, 1999) to account for choice response times and choice probabilities interpreted as preference strength; for judgment and confidence ratings (Pleskac and Busemeyer, 2010); and for selling prices, certainty equivalents, and preference reversal phenomena (Busemeyer and Goldstein, 1992; Johnson and Busemeyer, 2005). More recently, they have been applied to combining perceptual decision making and payoffs (Diederich and Busemeyer, 2006; Diederich, 2008; Rorie et al., 2010; Gao et al., 2011). Furthermore, these models have been closely linked to measures from neuroscience such as multi-cell electrode recordings (e.g., Ditterich, 2006; Gold and Shadlen, 2007; Churchland et al., 2008).



Sequential sampling models assume that (1) stimulus and 
choice alternative characteristics can be mapped onto a hypo- 
thetical numerical value representing the instantaneous level of 
evidence (activation, information, or preference — the wording 
often depends on the context), (2) some random fluctuation of 
this value over time occurs, (3) this evidence is accumulated 
over time, and (4) a final choice is made as soon as the evi- 
dence reaches a threshold. Therefore, sequential sampling can be 
described as a stochastic process. Two quantities are of foremost 
interest: (1) the probability that the process eventually reaches one
of the thresholds or boundaries for the first time (the criterion to 
initiate a response), i.e., first passage probability; (2) the time it 
takes for the process to reach one of the boundaries for the first 
time, i.e., first passage time. The former quantity is related to the 
observed relative frequencies, the latter usually to the observed 
mean choice response times or the observed choice response time 
distribution. 

Two classes of sequential sampling models have been predom- 
inantly used in psychology: Random walk/diffusion models and 
accumulator/counter models. The former are typically applied 
to a binary choice task, so that evidence for one choice alterna- 
tive is at the same time evidence against the other. A decision 
is made as soon as the process reaches one of two preset crite- 
ria. In the latter, an accumulator/counter is established for each 
choice alternative separately, and evidence is accumulated in par- 
allel. A decision is made as soon as one counter wins the race 
to reach one preset criterion. The accumulators/counters may or 
may not be independent. In the following we focus on random 
walk/diffusion models. For a review of both diffusion models and
counter models see Ratcliff and Smith (2004). 

To be more precise and to introduce notation, let $X(t)$ denote the accumulation process. For a binary choice, say between choice options A and B (Figure 1), the models assume that the decision process begins with an initial state of evidence $X(0)$. This initial state may favor option A ($X(0) > 0$) or option B ($X(0) < 0$), or it may be neutral with respect to A and B ($X(0) = 0$). Upon presentation of the choice options, the decision maker sequentially samples information from the stimulus display over time, retrieves information from memory, or forms preferences, depending on the context. The small increments of evidence sampled at any moment in time favor either option A ($dX(t) > 0$) or option B ($dX(t) < 0$). The evidence is accumulated from one moment in time to the next by summing the current state and the new increment:

$$X(t+h) \approx X(t) + \mu(X(t), t)\,h + \sigma(X(t), t)\,\big(W(t+h) - W(t)\big).$$

Here, $\mu(x, t)$ is called the drift rate and describes the expected value of the increments per unit time. The factor $\sigma(x, t)$ in front of the increments $W(t+h) - W(t)$ of a standard Wiener process $W(t)$ is called the diffusion rate and relates to the variance of the increments. This process continues until the magnitude of the cumulative evidence exceeds a threshold criterion $\theta$. The process stops and option A is chosen as soon as the accumulated evidence reaches a criterion value for choosing A (here, $X(t) = \theta_A > 0$), or it stops and option B is chosen as soon as the accumulated evidence reaches a criterion value for choosing B (here, $X(t) = \theta_B < 0$). The probability of choosing A over B is determined by the accumulation process reaching the threshold for A before reaching the threshold for B. The criterion is assumed to be set by the decision maker prior to the decision task.
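To make the two quantities of interest (first passage probability and first passage time) concrete, the accumulation rule above can be simulated directly. The following is a minimal Monte Carlo sketch, not the authors' method: it assumes a constant drift $\mu$ and diffusion rate $\sigma$ (a Wiener process with drift), an unbiased start $X(0) = 0$, and symmetric thresholds. The function name `simulate_ddm` and all parameter values are illustrative. For this special case the estimates can be checked against the standard closed-form results $\Pr[\text{choose A}] = 1/(1 + e^{-2\mu\theta/\sigma^2})$ and $E[T] = (\theta/\mu)\tanh(\mu\theta/\sigma^2)$.

```python
import numpy as np

def simulate_ddm(mu, sigma, theta_a, theta_b, x0=0.0, h=1e-3,
                 n_trials=4000, max_time=50.0, seed=0):
    """Euler scheme for X(t+h) = X(t) + mu*h + sigma*(W(t+h) - W(t)).

    Returns the estimated Pr[choose A] and the mean first passage times
    conditional on choosing A and on choosing B (trials still undecided
    at max_time count as neither).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_trials, x0, dtype=float)
    t_hit = np.full(n_trials, np.nan)        # first passage time of each trial
    choice = np.zeros(n_trials, dtype=int)   # +1 = A, -1 = B, 0 = undecided
    active = np.ones(n_trials, dtype=bool)
    for step in range(1, int(max_time / h) + 1):
        idx = np.flatnonzero(active)
        if idx.size == 0:
            break
        # Wiener increment W(t+h) - W(t) is normal with mean 0 and variance h
        x[idx] += mu * h + sigma * np.sqrt(h) * rng.standard_normal(idx.size)
        hit_a = idx[x[idx] >= theta_a]
        hit_b = idx[x[idx] <= theta_b]
        choice[hit_a] = 1
        choice[hit_b] = -1
        t_hit[hit_a] = step * h
        t_hit[hit_b] = step * h
        active[hit_a] = False
        active[hit_b] = False
    p_a = np.mean(choice == 1)
    return p_a, t_hit[choice == 1].mean(), t_hit[choice == -1].mean()

mu, sigma, theta = 0.2, 1.0, 1.0
p_a, t_a, t_b = simulate_ddm(mu, sigma, theta, -theta)
p_exact = 1.0 / (1.0 + np.exp(-2.0 * mu * theta / sigma**2))
t_exact = theta / mu * np.tanh(mu * theta / sigma**2)
print(f"Pr[A]: simulated {p_a:.3f}, exact {p_exact:.3f}")
print(f"E[T|A]: simulated {t_a:.3f}, exact {t_exact:.3f}")
```

Because the Euler scheme only checks the thresholds at multiples of $h$, it slightly overshoots them; a smaller $h$ reduces this bias at the cost of more steps.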



The Wiener process with drift, lately called the drift-diffusion
model in the psychological literature (Bogacz et al., 2006), is the
most widely applied model. Different versions reflect additional
assumptions for specific psychological domains. Ratcliff (1978)
proposed a diffusion model for memory retrieval that is used for
various psychological decision tasks. It is based on the work by
Laming (1968) and Link and Heath (1975) and assumes variability
in the starting point (i.e., $X(0)$ follows a uniform distribution)
and in the drift rate $\mu$ of the Wiener process, which is normally
distributed across trials (cf. Laming). The residual time, i.e., the time other
than the decision time, such as stimulus encoding and motor 
response, is assumed to be uniformly distributed and added to 
the decision time, i.e., response time equals the decision time 
plus a residual (non-decision) time. For a recent overview with 
applications see Voss et al. (2013). Other approaches include 
the Ornstein-Uhlenbeck model that linearly accumulates evi- 
dence with decay (Busemeyer and Townsend, 1993; Diederich, 
1997), and the leaky competing accumulator model (Usher and 
McClelland, 2001) that non-linearly accumulates evidence with 
decay. 

Common to almost all of these approaches is the assump- 
tion that a single integrated source of evidence generates the 
evidence during the deliberation process leading to a decision. 
In particular, the integrated source may be based on multiple 
features or attributes, but all of these features or attributes are 
assumed to be combined and integrated into a single source of 
evidence, and this single source is used throughout the decision process until a final decision is reached. Diederich (e.g., Diederich, 1995, 1997, 2003, 2008), however, assumed a separate process for each attribute¹. The decision maker switches attention from one attribute to the next during the time course of one trial. For instance, in a crossmodal task (visual, auditory, tactile), Diederich (1995) assumed serial processing controlled by stimulus input at given stimulus onset asynchronies (SOA). That is, the order of attributes (here, a light, followed by a tone, followed by a tactile vibration), as well as the points in time when a new attribute was added (here, the tone presented at $t_1$, i.e., $t_1$ ms after the light onset, and the tactile vibration at $t_2$, i.e., $t_2$ ms after the light onset), were determined externally by the experimental setup. In the following, we call attention switches at predetermined, fixed times and in a predefined order of attributes a deterministic time and order schedule. Often, however, neither the processing order of attributes nor the point in time when the decision maker switches attention from one attribute to the next is known or can be inferred from the experimental setup. For those cases, Diederich (1997) proposed a specific model in which attention switches from one attribute to the next with some probability. This is an instance of a random time and order schedule, which is investigated more systematically in the present study.



¹The notion of attributes is defined here in a broad sense. For example, it includes dimensions such as color and size of a visual target; amplitude and frequency of a tone; different modalities in a crossmodal task; payoff information and perceptual information; attitudinal evidence and perceptual evidence; price and quality of a consumer product; and more.



[Figure 1: accumulation paths between the threshold $\theta_A$ for A and the threshold $\theta_B$ for B.]

FIGURE 1 | The trajectories symbolize the accumulation process for three different trials. In one trial (red) the process is absorbed at the boundary for making an A response. In another trial (blue) the process is absorbed at the boundary for making a B response. For the third trial (black) the accumulation process still evolves and no response is yet initiated.
The purpose of this paper is to present a unified treatment of sequential sampling models for both deterministic and random time and order schedules. To do so, we start by deriving expressions for mean choice response times and choice probabilities for a deterministic time and order schedule before we show how they extend to random time and order schedules, including Poisson, binomial, geometric, and uniform distributions for the attention time devoted to each attribute in the sequence before attention switches to the next, randomly or deterministically chosen, attribute. We will provide first numerical evidence on the influence of various properties of a schedule on the predictions for mean choice response times and choice probabilities.

2. PRELIMINARIES 

The model applies to any finite number of attributes that the decision maker may consider, i.e., $k = 1, \ldots, K$. For convenience, we first describe the process for one attribute. As the underlying information process for each attribute we assume an Ornstein-Uhlenbeck process $X(t)$ defined by



$$dX(t) = \big(\delta_k - \gamma_k X(t)\big)\,dt + \sigma_k\,dW(t), \qquad (1)$$



where $W(t)$ is a standard Wiener process. The parameters $\delta_k$, $\gamma_k$, and $\sigma_k$ are characteristics of the $k$-th attribute. The attribute characteristics may affect the quality of the extracted evidence for choosing A over B, and this quality of evidence determines the drift rate $\delta_k$. That is, the better an attribute discriminates between A and B, the larger $\delta_k$ is. The parameter $\gamma_k$, which induces a change of the drift rate depending on the current state in the state space, is often connected to memory processes (e.g., primacy and recency effects), conflict situations (e.g., approach-avoidance), or similarities between choice alternatives. Thus, together the effective drift $\delta_k - \gamma_k X(t)$ determines the direction and the velocity of the process when considering the $k$-th attribute at time $t$. Note that setting $\gamma_k$ to 0 results in a Wiener process with drift. That is, all the analysis we perform in the following is also valid for the Wiener process with drift. The diffusion coefficient $\sigma_k$ indicates the variance of the increments of the process; for simplicity, we will set $\sigma_k = \sigma$ for all $k$.



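The attribute-related transition matrices $P_k$, $k = 1, \ldots, K$, are given in their canonical form by

$$P_k = \begin{pmatrix} I & 0 \\ R_k & Q_k \end{pmatrix}, \qquad R_k = \big[R_{B,k},\, R_{A,k}\big] = \begin{pmatrix} p_{21} & 0 \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \\ 0 & p_{m-1,m} \end{pmatrix},$$

$$Q_k = \begin{pmatrix} p_{22} & p_{23} & & & \\ p_{32} & p_{33} & p_{34} & & \\ & \ddots & \ddots & \ddots & \\ & & p_{m-2,m-3} & p_{m-2,m-2} & p_{m-2,m-1} \\ & & & p_{m-1,m-2} & p_{m-1,m-1} \end{pmatrix} \quad \text{(entries not shown are zero)}, \qquad (2)$$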



where

$$p_{i,i-1} = \frac{1}{2\alpha}\left(1 - \frac{(\delta_k - \gamma_k i\Delta)\sqrt{\alpha\tau}}{\sigma}\right), \qquad p_{i,i+1} = \frac{1}{2\alpha}\left(1 + \frac{(\delta_k - \gamma_k i\Delta)\sqrt{\alpha\tau}}{\sigma}\right), \qquad p_{ii} = 1 - \frac{1}{\alpha}$$

for $i = 2, \ldots, m-1$ (here, the index $i$ in $i\Delta$ stands for the corresponding state $i - 1 - m_B$). As $\Delta \to 0$ (or, equivalently, $\tau \to 0$), the decision probabilities and mean choice response times obtained from the Markov chain model converge to the values obtained from the underlying continuous process $X(t)$. The identity matrix $I$ corresponds to the two absorbing states ($-m_B$ and $m_A$) associated with the two decision thresholds, one for each choice alternative; the matrix $Q_k$ contains the transition probabilities among the transient states, corresponding to the updating evidence process; and the matrix $R_k$ contains the one-step transition probabilities from the transient to the absorbing states. In particular, the first column vector of $R_k$ (denoted by $R_{B,k}$) contains the one-step probabilities for reaching alternative B, while the second column $R_{A,k}$ contains those for alternative A. For details and derivations see Diederich (1997) and Diederich and Busemeyer (2003).



2.1. MATRIX APPROACH 

Stochastic processes such as the above $X(t)$ can be approximated by a discrete-time, finite-state-space Markov chain. We use the matrix approach since it is simple to implement, sufficient for determining the quantities of interest (i.e., choice probabilities and choice response times), and flexible enough to account for non-stationary and non-linear properties one may wish to include in the decision-making process in the future. The continuous state space $[\theta_B, \theta_A]$ of the piecewise Ornstein-Uhlenbeck process $X(t)$ is replaced by a finite state space $S = \{-m_B, \ldots, m_A\}$ with $m = m_A + m_B + 1$ states. The diffusion process $\{X(t), t \ge 0\}$ is approximated by a discrete random walk $\{X(n), n \ge 0\}$ with values in $S$ such that $X(n\tau) \approx \Delta \cdot X(n)$, $\theta_A \approx m_A\Delta$, and $\theta_B \approx -m_B\Delta$, where $\Delta$ is the step size of change in evidence. To achieve convergence in the limit, the discretization parameters ($\Delta$ for the state space and $\tau$ for time) are tied to each other by the relation $\Delta = \sigma\sqrt{\alpha\tau}$.
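The construction of $Q_k$ and $R_k$ can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code: it assumes symmetric thresholds $\theta_A = -\theta_B = \theta$ with $m_A = m_B$ and the relation $\Delta = \sigma\sqrt{\alpha\tau}$; the helper names `transition_blocks` and `start_vector` and the default $\alpha = 1.5$ are hypothetical choices. With an infinite horizon and a single attribute, summing the absorption probabilities over all steps gives $[p_B, p_A] = Z'(I - Q_k)^{-1}R_k$, which for $\gamma_k = 0$ can be compared with the Wiener result $1/(1 + e^{-2\delta\theta/\sigma^2})$.

```python
import numpy as np

def transition_blocks(delta, gamma=0.0, sigma=1.0, theta=10.0, m_half=40, alpha=1.5):
    """Transient block Q (m-2 x m-2) and absorbing block R = [R_B, R_A]
    (m-2 x 2) of the birth-death chain for one attribute (hypothetical helper).

    Assumes theta_A = -theta_B = theta, m_A = m_B = m_half, and
    Delta = sigma * sqrt(alpha * tau), i.e. tau = Delta**2 / (alpha * sigma**2).
    """
    step = theta / m_half                           # Delta
    tau = step**2 / (alpha * sigma**2)              # time per step
    x = (np.arange(1, 2 * m_half) - m_half) * step  # evidence of the m-2 transient states
    c = (delta - gamma * x) * np.sqrt(alpha * tau) / sigma
    if np.any(np.abs(c) > 1):
        raise ValueError("Delta too large: transition probabilities outside [0, 1]")
    down = (1 - c) / (2 * alpha)                    # p_{i,i-1}
    up = (1 + c) / (2 * alpha)                      # p_{i,i+1}
    Q = (np.diag(np.full(x.size, 1 - 1 / alpha))    # p_{ii} = 1 - 1/alpha
         + np.diag(up[:-1], 1) + np.diag(down[1:], -1))
    R = np.zeros((x.size, 2))
    R[0, 0] = down[0]     # lowest transient state -> threshold for B
    R[-1, 1] = up[-1]     # highest transient state -> threshold for A
    return Q, R, tau

def start_vector(m_half=40):
    """Z: unit mass on the neutral state X(0) = 0."""
    z = np.zeros(2 * m_half - 1)
    z[m_half - 1] = 1.0
    return z

# Infinite horizon, single attribute, Wiener case (gamma = 0)
delta, theta = 0.2, 10.0
Q, R, tau = transition_blocks(delta, theta=theta)
z = start_vector()
p_b, p_a = z @ np.linalg.solve(np.eye(Q.shape[0]) - Q, R)
print(f"chain p_A = {p_a:.5f}, Wiener p_A = {1 / (1 + np.exp(-2 * delta * theta)):.5f}")
```

Each row of $[R_k, Q_k]$ sums to 1, which is a quick sanity check on any alternative parameterization of the transition probabilities.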



2.2. TIME AND ORDER SCHEDULE 

For $K$ attributes, each one to be considered for some specific time in some specific order, it is convenient to introduce a formal schedule of both time and order. A finite time and order schedule consists of a set of $L$ consecutive time intervals $\{[t_{l-1}, t_l]\}_{l=1,\ldots,L}$ and the attribute sequence $\{k_l \in \{1, \ldots, K\}\}_{l=1,\ldots,L}$, which specifies that during the time interval $[t_{l-1}, t_l]$ the $k_l$-th attribute is considered. At switching time $t_l$, $l = 1, \ldots, L-1$, attention switches from attribute $k_l$ to attribute $k_{l+1}$. Depending on the situation, the final time $t_L$ may be set finite (then the decision process may also finish without deciding for one of the alternatives) or infinite. Consequently, the process $X(t)$ determined by such a schedule is a piecewise Ornstein-Uhlenbeck process, defined over a finite partition $t_0 = 0 < t_1 < \cdots < t_{L-1} < t_L \le +\infty$ of the time interval $[0, t_L]$, where for $t \in [t_{l-1}, t_l]$ the process is determined by (1) with $k = k_l$. Figure 2 shows an example with three



Frontiers in Human Neuroscience 



www.frontiersin.org 



September 2014 | Volume 8 | Article 697 | 3 



Diederich and Oswald 



Multiattribute attention switching model 



[Figure 2: trajectories between the thresholds for A and B; switching times $t_1$, $t_2$, $t_3$; effective drifts $\delta_1 > 0$, $\delta_2 < 0$, $\delta_1 > 0$, $\delta_3 > 0$ in the four intervals.]

FIGURE 2 | A piecewise Ornstein-Uhlenbeck process with three different attributes. The attribute order is (1, 2, 1, 3); attribute 1 is considered twice in the sequence of attribute consideration. Switching attention from one attribute to the next occurs at fixed times $t_1$, $t_2$, and $t_3$. The trajectories reflect the accumulation process for two different trials. The black solid lines indicate the effective drift of the process.



different attributes ($K = 3$) and a deterministic time and order schedule of length $L = 4$ with switching times $t_l$ independent of the trajectories, and attribute order (1, 2, 1, 3), i.e., $k_1 = 1$, $k_2 = 2$, $k_3 = 1$, $k_4 = 3$ (note that the first attribute is reconsidered once).

For fixed $\Delta$ (and hence fixed $\tau$), the $m \times m$ transition probability matrix $P_n$ containing the transition probabilities $p_{ii'} := \Pr(X_{n+1} = i' \mid X_n = i)$ for the $n$-th step of the discrete-time random walk depends on the currently considered attribute defined by the time and order schedule, i.e., we set $P_n = P_{k_l}$ if $n = n_{l-1}, \ldots, n_l - 1$, where $n_0 = 0$ and $\tau n_l \approx t_l$ for $l = 1, \ldots, L$ (if $t_L = \infty$, we formally set $n_L = \infty$).
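The rule $P_n = P_{k_l}$ for $n_{l-1} \le n < n_l$ amounts to looking up the interval that contains step $n$. A small sketch, assuming the switching steps are obtained by rounding $t_l/\tau$; the function names are hypothetical, and the example uses the order (1, 2, 1, 3) of Figure 2 with made-up switching times.

```python
import bisect
import math

def switch_steps(times, tau):
    """n_l = round(t_l / tau); an infinite final time stays infinite."""
    return [t if math.isinf(t) else round(t / tau) for t in times]

def attribute_at_step(n, steps, order):
    """Attribute k_l whose matrix P_{k_l} governs step n, i.e. n_{l-1} <= n < n_l."""
    l = bisect.bisect_right(steps, n)   # number of switching steps n_l <= n
    if l >= len(order):
        raise ValueError("step n lies at or beyond the final step n_L")
    return order[l]

order = [1, 2, 1, 3]                       # k_1, ..., k_4 as in Figure 2
steps = switch_steps([1.0, 2.5, 3.0, 5.0], tau=0.01)
print(steps)                               # [100, 250, 300, 500]
print([attribute_at_step(n, steps, order) for n in (0, 99, 100, 260, 499)])
# [1, 1, 2, 1, 3]
```

With an infinite final time, `switch_steps` keeps `math.inf` as $n_L$, so every step after the last switch is assigned to $k_L$.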

3. CHOICE PROBABILITIES AND MEAN CHOICE RESPONSE TIMES

In this section we derive the choice probabilities and mean choice response times for various time and order schedules. For simplicity, we assume an unbiased process, i.e., with $X(0) = 0$, and symmetric decision thresholds, i.e., $\theta_A = -\theta_B$. Since the diffusion coefficient is a scaling parameter, it will be set to $\sigma = 1$ for all attributes throughout. We start with the deterministic time and order schedule.

3.1. DETERMINISTIC TIME AND ORDER SCHEDULE 

The evidence accumulation process for attribute $k_1$, which is considered first, evolves until time $t_1$, when the second attribute $k_2$ comes into consideration, triggering a change in the accumulation process. This attribute in turn is considered until time $t_2$, when a third attribute $k_3$ is considered, and so forth until a decision is initiated (or $t_L$ is reached). Let the random variables $T_A$ and $T_B$ denote the finite time when the process reaches a decision threshold $\theta_A$ or $\theta_B$, stops, and a decision response for A or B is initiated. With the switching times $t_l$ replaced by integers $n_l \approx t_l/\tau$, the choice probability $\Pr[\text{choose A}] = \Pr(T_A < \infty)$ is then approximated by the value $p_A$ obtained from the discrete random walk model as

$$\Pr(T_A < \infty) \approx p_A := Z'\Big(\sum_{i=1}^{n_1} Q_{k_1}^{i-1} R_{A,k_1} + Q_{k_1}^{n_1}\sum_{i=n_1+1}^{n_2} Q_{k_2}^{i-(n_1+1)} R_{A,k_2} + \cdots + Q_{k_1}^{n_1} Q_{k_2}^{n_2-n_1}\cdots Q_{k_{L-1}}^{n_{L-1}-n_{L-2}} \sum_{i=n_{L-1}+1}^{n_L} Q_{k_L}^{i-(n_{L-1}+1)} R_{A,k_L}\Big), \qquad (3)$$

where $Z$ is the probability distribution for the initial state $X(0)$. For instance, for an unbiased process, $Z$ would be a coordinate vector with probability 1 at state 0, halfway between the decision thresholds. The remaining vectors and matrices are those defined in (2). The evidence accumulation process for a successive attribute starts with the final evidence state of the previous attribute. Note that $Z'Q_{k_1}^{n_1}$ to $Z'Q_{k_1}^{n_1}\cdots Q_{k_{L-1}}^{n_{L-1}-n_{L-2}}$ are defective distributions (i.e., their entries do not sum up to 1) for the states of the random walk at discrete times $n_1, \ldots, n_{L-1}$. Further note that the stochastic process is time homogeneous within each of the time intervals $[0, t_1)$ to $[t_{L-1}, t_L]$ but non-homogeneous across $[0, t_L]$ (see Diederich, 1992, 1995).

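Similarly, the mean response time for choosing alternative A is approximated as

$$E[T_A \mid \text{choose A}] \approx ET_A := \frac{\tau}{p_A}\, Z'\Big(\sum_{i=1}^{n_1} i\, Q_{k_1}^{i-1} R_{A,k_1} + Q_{k_1}^{n_1}\sum_{i=n_1+1}^{n_2} i\, Q_{k_2}^{i-(n_1+1)} R_{A,k_2} + \cdots + Q_{k_1}^{n_1}\cdots Q_{k_{L-1}}^{n_{L-1}-n_{L-2}} \sum_{i=n_{L-1}+1}^{n_L} i\, Q_{k_L}^{i-(n_{L-1}+1)} R_{A,k_L}\Big). \qquad (4)$$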



The probability and the mean response time for choosing alternative B can be determined similarly. Note that $p_0 := 1 - (p_A + p_B)$, the probability of not making a decision until the final time $t_L$, is strictly positive if $t_L < \infty$. As shown in Diederich (1997), these formulas can be further compactified. We will do this below for the general case of deterministic and random schedules by deriving an efficient recursion for their evaluation.
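Equations (3) and (4) translate directly into a loop that propagates the defective distribution one step at a time and records the absorption probability at each step. The following sketch is not the authors' code; it assumes symmetric thresholds $\theta_A = -\theta_B = 10$, 40 states per side, and $\Delta = \sigma\sqrt{\alpha\tau}$ with a hypothetical $\alpha = 1.5$. The helper names, drift values, and switching steps are illustrative only, chosen to mimic the order (1, 2, 1, 3) of Figure 2. It returns $[p_B, p_A]$, the unnormalized sums of (4), and $p_0$.

```python
import numpy as np

def transition_blocks(delta, gamma=0.0, sigma=1.0, theta=10.0, m_half=40, alpha=1.5):
    """Q, R = [R_B, R_A] and tau for one attribute (assumes Delta = sigma*sqrt(alpha*tau))."""
    step = theta / m_half
    tau = step**2 / (alpha * sigma**2)
    x = (np.arange(1, 2 * m_half) - m_half) * step
    c = (delta - gamma * x) * np.sqrt(alpha * tau) / sigma
    down, up = (1 - c) / (2 * alpha), (1 + c) / (2 * alpha)
    Q = np.diag(np.full(x.size, 1 - 1 / alpha)) + np.diag(up[:-1], 1) + np.diag(down[1:], -1)
    R = np.zeros((x.size, 2))
    R[0, 0], R[-1, 1] = down[0], up[-1]
    return Q, R, tau

def deterministic_schedule(z, blocks, steps):
    """Eqs. (3)-(4): blocks[l] = (Q, R) of the l-th attribute in the order,
    steps = [n_1, ..., n_L].

    Returns p = [p_B, p_A] and s = [sum_i i * Pr(absorb in B at step i), same for A],
    so that E[T_A | choose A] ~ tau * s[1] / p[1].
    """
    p, s = np.zeros(2), np.zeros(2)
    dist = z.copy()                  # defective distribution over transient states
    n_prev = 0
    for (Q, R), n_l in zip(blocks, steps):
        for i in range(n_prev + 1, n_l + 1):
            absorbed = dist @ R      # Pr(absorption exactly at step i)
            p += absorbed
            s += i * absorbed
            dist = dist @ Q
        n_prev = n_l
    return p, s

m_half = 40
z = np.zeros(2 * m_half - 1)
z[m_half - 1] = 1.0                            # X(0) = 0
attr = {k: transition_blocks(d) for k, d in {1: 0.2, 2: -0.1, 3: 0.3}.items()}
tau = attr[1][2]
order, steps = [1, 2, 1, 3], [300, 600, 900, 1500]
p, s = deterministic_schedule(z, [attr[k][:2] for k in order], steps)
p0 = 1 - p.sum()
print(f"p_A = {p[1]:.3f}, p_B = {p[0]:.3f}, p_0 = {p0:.3f}")
print(f"E[T_A | A] = {tau * s[1] / p[1]:.2f}, E[T_B | B] = {tau * s[0] / p[0]:.2f}")
```

The cost grows linearly with $n_L$; this is exactly the overhead that the compact recursion below is meant to avoid for random schedules.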

3.2. RANDOM TIME AND ORDER SCHEDULE 

The above derivation of formulas for choice probabilities and mean response times for a deterministic time and order schedule has counterparts for random schedules, which we describe next in three steps.



3.2.1. Random order schedule 

For generating the attribute order $\{k_l\}_{l=1,\ldots,L}$ we consider stochastic $K \times K$ matrices $D^{(l)}$ such that $d^{(l)}_{k'k} \ge 0$ describes the probability with which attention switches from the $k'$-th attribute to the $k$-th attribute at switching time $t_l \approx \tau n_l$, $l = 1, \ldots, L-1$. Normally, $d^{(l)}_{kk} = 0$ would be assumed, to avoid a no-switching situation. For two attributes ($K = 2$), we must then have $d^{(l)}_{11} = d^{(l)}_{22} = 0$ and $d^{(l)}_{12} = d^{(l)}_{21} = 1$, and the attribute sequence is either (1, 2, 1, 2, ...) or (2, 1, 2, 1, ...), depending on whether $k_1 = 1$ or $k_1 = 2$. For three attributes and $L = 3$, choosing

$$D^{(1)} = \begin{pmatrix} 0 & 1/2 & 1/2 \\ 1/2 & 0 & 1/2 \\ 1/2 & 1/2 & 0 \end{pmatrix}, \qquad D^{(2)} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 3/4 & 1/4 & 0 \end{pmatrix}$$

would for $k_1 = 1$ result in the order sequences (1, 2, 1), (1, 3, 1), and (1, 3, 2) with probabilities 1/2, 3/8, and 1/8, respectively. The above matrix $D^{(1)}$ models the situation in which no preference or bias for considering attributes can be asserted.
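The probabilities of the order sequences can be checked by enumerating all paths of the non-stationary Markov chain defined by $D^{(1)}, \ldots, D^{(L-1)}$. A short sketch using exact fractions; the function name `order_probabilities` is ours, not from the article.

```python
from fractions import Fraction as F
from itertools import product

def order_probabilities(k1, D_list):
    """Probabilities of all attribute sequences (k_1, ..., k_L) generated by the
    non-stationary Markov chain with transition matrices D^(1), ..., D^(L-1)."""
    K = len(D_list[0])
    probs = {}
    for tail in product(range(1, K + 1), repeat=len(D_list)):
        seq = (k1,) + tail
        p = F(1)
        # multiply d^(l)_{k_l, k_{l+1}} along the sequence (attributes are 1-based)
        for D, (a, b) in zip(D_list, zip(seq, seq[1:])):
            p *= D[a - 1][b - 1]
        if p > 0:
            probs[seq] = p
    return probs

D1 = [[0, F(1, 2), F(1, 2)], [F(1, 2), 0, F(1, 2)], [F(1, 2), F(1, 2), 0]]
D2 = [[0, 1, 0], [1, 0, 0], [F(3, 4), F(1, 4), 0]]
print(order_probabilities(1, [D1, D2]))
# {(1, 2, 1): Fraction(1, 2), (1, 3, 1): Fraction(3, 8), (1, 3, 2): Fraction(1, 8)}
```

Enumeration grows as $K^{L-1}$, so it is only a check for small examples; the matrix recursion below never enumerates sequences explicitly.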

3.2.2. Random time schedule 

We assume that the number of discrete time steps during which attention is paid to the $k$-th attribute is a discrete random variable, denoted by $T_{at}$, with a given distribution. In principle, this distribution may change its type and may have different parameters, such as the expected value, depending on the attribute and the attribute order $\{k_l\}$. This can be used to model time pressure and other temporal effects. Often, however, we assume one and the same distribution type for attention times across all attributes and allow for different parameters only.

For instance, the geometric distribution (as implicitly considered in Diederich, 1997) is given by

$$\Pr(T_{at} = n) = (1 - r)^{n-1}\, r, \qquad n = 1, 2, \ldots,$$

and characterized by a single parameter $0 < r \le 1$, with expectation $1/r$ and variance $(1 - r)/r^2$, and the uniform distribution is defined as

$$\Pr(T_{at} = n) = \frac{1}{2M + 1}, \qquad n = N - M, \ldots, N + M,$$

with parameters $N$ and $M = 0, 1, \ldots, N - 1$, expectation $N$, and variance $M(M + 1)/3$. Details for the other tested distributions (Poisson with parameter $\lambda > 0$, and binomial with parameters $n$ and $p$) are omitted. For comparable expectation values $E(T_{at})$ (i.e., for parameter choices $1/r \approx N \approx \lambda \approx np$), the geometric distribution has a much larger variance than the Poisson, binomial, and uniform distributions with $M \approx \sqrt{N}$ (the latter are very close to each other). Figure 3 shows the pmf and cdf for different $T_{at}$ distributions with fixed mean value $E(T_{at}) = 300$. The two uniform distributions have $M = 150 = N/2$ and $M = 299 = N - 1$. Varying the parameter $M$ of the uniform distribution allows us to produce intermediate results between the deterministic and geometric distribution cases, as shown in the following.

3.2.3. Constructing random time and order schedules

We create a random time and order schedule of length $L$ in two steps. First, given an initial distribution of $k_1 \in \{1, \ldots, K\}$, we create the attribute sequence $\{k_l\}_{l=2,\ldots,L}$ using a non-stationary Markov chain model with transition probability matrices $D^{(l)}$, $l = 1, \ldots, L-1$. In a second step, for each $l = 1, \ldots, L$, the attention time $T^{(l)}_{at} = n_l - n_{l-1}$ is created by the discrete random variable responsible for the attention time paid to the $k_l$-th attribute; these choices are independent for the different $l$. Consequently, $t_l - t_{l-1} \approx \tau T^{(l)}_{at}$ is the real attention time paid to the $k_l$-th attribute. We note that semi-random schedules, where the sequence $\{k_l\}$ is given deterministically and only the $T^{(l)}_{at}$ are determined as in the second step outlined above, are covered if we choose the $D^{(l)}$ such that $d^{(l)}_{k_l k_{l+1}} = 1$.

To understand the recursive computation of choice probabilities and mean response times in this more general case, we first consider the special cases $L = 1, 2$ and illustrate the derivation on some distribution types of the random variable $T_{at}$ generating attention times by providing concrete formulas. In general, the distribution of $T_{at}$ is given by its probability mass function (pmf) and cumulative distribution function (cdf)

$$\Pr(T_{at} = n) = p_{n,k}, \qquad \Pr(T_{at} \le n) = f_{n,k} := \sum_{i=0}^{n} p_{i,k}, \qquad n = 0, 1, \ldots. \qquad (5)$$

We start with $L = 1$ and drop the index $l$ from the notation introduced in the previous subsection. Since the probability of choosing alternative A at the $i$-th step is given by $Z'Q_k^{i-1}R_{A,k}$, $i = 1, \ldots, T_{at}$, and $T_{at}$ is a random variable distributed according to (5), we get

$$p_{A,k} = \sum_{n=0}^{\infty} p_{n,k}\Big(\sum_{i=1}^{n} Z'Q_k^{i-1}\Big)R_{A,k} = Z'\sum_{i=1}^{\infty}\Big(\sum_{n=i}^{\infty} p_{n,k}\Big)Q_k^{i-1}R_{A,k} = Z'\sum_{i=0}^{\infty}(1 - f_{i,k})\,Q_k^{i}R_{A,k}.$$



A similar formula holds for $p_{B,k}$. To avoid repetition, we introduce the row vector $p_{AB,k} := [p_{B,k}, p_{A,k}]$; then

$$p_{AB,k} = Z'V_k, \qquad V_k := \sum_{i=0}^{\infty}(1 - f_{i,k})\,Q_k^{i}R_k. \qquad (6)$$

The $(m-2) \times 2$ matrix $V_k$ depends on the attribute and its parameters via $Q_k$ and $R_k$, and on the chosen attention time distribution via the cdf $(f_{n,k})$. For the discussed concrete attention time distributions these matrices may be precomputed; in some cases closed-form expressions can be found, e.g., for the geometric distribution with parameter $r = r_k$ we have
[Figure 3: (A) Attention time ($T_{at}$): probability mass distribution; (B) Attention time ($T_{at}$): cumulative distribution function.]

FIGURE 3 | Probability mass distributions (A) and cumulative distribution functions (B) for commonly used attention time distributions. All distributions have expected value 300. The uniform distributions with $N = 300$ and $M = N/2 = 150$ are labeled as Unif.1, and with $N = 300$ and $M = N - 1 = 299$ as Unif.2. Geom. represents the geometric distribution.



$$V_k = \sum_{i=0}^{\infty}\Big(\sum_{n=i+1}^{\infty} p_{n,k}\Big)Q_k^{i}R_k = \sum_{i=0}^{\infty}(1 - r_k)^{i}\,Q_k^{i}R_k = \big(I - (1 - r_k)Q_k\big)^{-1}R_k.$$
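The series (6) can be evaluated for any attention time pmf by accumulating powers of $Q_k$, and in the geometric case it should agree with the closed form above. A minimal sketch, assuming symmetric thresholds $\theta = 10$, 40 states per side, and $\Delta = \sigma\sqrt{\alpha\tau}$ with a hypothetical $\alpha = 1.5$; helper names and parameter values are illustrative. It also checks the stated moments of the geometric and uniform distributions with mean 300, as used in Figure 3.

```python
import numpy as np

def transition_blocks(delta, gamma=0.0, sigma=1.0, theta=10.0, m_half=40, alpha=1.5):
    """Q and R = [R_B, R_A] for one attribute (assumes Delta = sigma*sqrt(alpha*tau))."""
    step = theta / m_half
    x = (np.arange(1, 2 * m_half) - m_half) * step
    c = (delta - gamma * x) * step / sigma**2      # = (delta - gamma*x) sqrt(alpha*tau) / sigma
    down, up = (1 - c) / (2 * alpha), (1 + c) / (2 * alpha)
    Q = np.diag(np.full(x.size, 1 - 1 / alpha)) + np.diag(up[:-1], 1) + np.diag(down[1:], -1)
    R = np.zeros((x.size, 2))
    R[0, 0], R[-1, 1] = down[0], up[-1]
    return Q, R

def geometric_pmf(r, n_max):
    """p_n = (1 - r)^(n-1) r for n = 1..n_max (truncated), p_0 = 0."""
    pmf = np.zeros(n_max + 1)
    pmf[1:] = r * (1 - r) ** np.arange(n_max)
    return pmf

def uniform_pmf(N, M):
    """p_n = 1/(2M+1) for n = N-M, ..., N+M."""
    pmf = np.zeros(N + M + 1)
    pmf[N - M:] = 1.0 / (2 * M + 1)
    return pmf

def V_matrix(Q, R, pmf):
    """Eq. (6), truncated at the end of the pmf array: sum_i (1 - f_i) Q^i R."""
    f = np.cumsum(pmf)
    V, term = np.zeros_like(R), R.copy()      # term = Q^i R
    for i in range(len(pmf)):
        V += (1 - f[i]) * term
        term = Q @ term
    return V

def moments(pmf):
    """Mean and variance of a pmf on 0, 1, ..., len(pmf) - 1."""
    n = np.arange(len(pmf))
    mean = n @ pmf
    return mean, (n - mean) ** 2 @ pmf

r = 1 / 300
Q, R = transition_blocks(delta=0.2)
V_series = V_matrix(Q, R, geometric_pmf(r, 12000))
V_closed = np.linalg.solve(np.eye(Q.shape[0]) - (1 - r) * Q, R)
print(np.abs(V_series - V_closed).max())
print(moments(geometric_pmf(r, 12000)), moments(uniform_pmf(300, 150)))
```

For finite-support distributions such as the uniform one, the truncated series is exact; for the geometric distribution the truncation point only needs to make $(1 - r)^{n_{max}}$ negligible.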

Next we discuss choice probabilities for the case $L = 2$, assuming for simplicity that the attention time distribution is the same for all attributes. To save on indices, denote $k_1 = k'$, $k_2 = k$, and $D^{(1)} = D$ (this matrix is responsible for the random choice of $k$ given any $k'$). Then the decision probability vector $p_{AB,k',k}$ for reaching alternatives B or A with attribute order $(k', k)$ has two parts: the probabilities of having decided while still considering the $k'$-th attribute (i.e., $T_{A/B} < T^{(1)}_{at}$, where $T^{(1)}_{at}$ is the randomly generated attention time for the first attribute $k'$), plus the probabilities that $T^{(1)}_{at} \le T_{A/B} < T^{(1)}_{at} + T^{(2)}_{at}$, where $T^{(2)}_{at}$ is the randomly (and independently) generated attention time for the second attribute $k$. On top of this, $k$ itself is randomly chosen according to the entries in the $k'$-th row of $D$. Thus, for each fixed $k_1 = k'$ and $n_1 = T^{(1)}_{at}$, according to (6), the probabilities for reaching a decision after $n_1$ are given by

$$\sum_{k=1}^{K} d_{k'k}\, Z'Q_{k'}^{n_1}V_k.$$

Thus, for $L = 2$, the choice probabilities (under the assumption that $k_1 = k'$ is fixed) can be obtained as

$$[p_B, p_A]\big|_{k_1 = k'} = Z'V_{k'} + \sum_{n \ge 0} p_{n,k'}\, Z'Q_{k'}^{n}\Big(\sum_{k=1}^{K} d_{k'k}V_k\Big) = Z'\Big(V_{k'} + B_{k'}\sum_{k=1}^{K} d_{k'k}V_k\Big), \qquad k' = 1, \ldots, K, \qquad (7)$$

where

$$B_k = \sum_{n \ge 0} p_{n,k}\,Q_k^{n}, \qquad k = 1, \ldots, K,$$

are $(m-2) \times (m-2)$ matrices depending on the attribute and the attention time distribution type. For example, for the geometric distribution this simplifies to $B_k = r_kQ_k\big(I - (1 - r_k)Q_k\big)^{-1}$; closed-form expressions are available for the Poisson, binomial, and uniform distributions as well.

For arbitrary $L$, it is more convenient to write the resulting recursion in terms of block-matrix-vector operations. Denote by
$\mathbf{Z}$ the $K \times 1$ array with each entry equal to the initial distribution $Z$ (and think of $\mathbf{Z}'$ as its transpose, a $1 \times K$ array with entries $Z'$),
$\mathbf{B}$ the $K \times K$ diagonal array with the $B_k$ on the diagonal (similarly for $\mathbf{C}$ defined later),
$\mathbf{I}$ the $K \times K$ diagonal array with identity matrices $I$ of the appropriate size on the diagonal,
$\mathbf{V}$ the $K \times 1$ array with the $V_k$ as entries (similarly for $\mathbf{W}$ defined later), and
$\mathbf{P}_{AB}$ the $K \times 2$ matrix whose rows are the choice probabilities $[p_B, p_A]|_{k_1=k}$ defined before in the case $L = 2$.
Then the above result for $L = 2$ can be compactly written as

$$\mathbf{P}_{AB} = \mathbf{Z}'(\mathbf{I} + \mathbf{B}D)\mathbf{V}. \qquad (8)$$



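Note that the product $\mathbf{B}D$ of the array $\mathbf{B}$ with the matrix $D$ is interpreted as the $K \times K$ array with $d_{k'k}B_{k'}$ as the entry in row $k'$ and column $k$. Moreover, by iterating (8), i.e., by replacing the $\mathbf{V}$ that follows $\mathbf{B}D$ with the result for the remaining part of the schedule, one arrives at the formula for arbitrary $L$:

$$\mathbf{P}_{AB} = \mathbf{Z}'\Big(\mathbf{I} + \mathbf{B}D^{(1)}\big(\mathbf{I} + \mathbf{B}D^{(2)}\big(\cdots\big(\mathbf{I} + \mathbf{B}D^{(L-1)}\big)\cdots\big)\big)\Big)\mathbf{V}. \qquad (9)$$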
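The nested form of (9) is naturally evaluated from the inside out: start with $\mathbf{V}$ for the last interval and apply $\mathbf{V} + \mathbf{B}D^{(l)}(\cdot)$ for $l = L-1, \ldots, 1$. The sketch below does this for a general attention time pmf and, as a consistency check, uses a degenerate pmf (all mass at $N$ steps), for which the random schedule must reproduce the deterministic formula (3) with $n_l = lN$. It assumes symmetric thresholds $\theta = 10$, 40 states per side, and $\Delta = \sigma\sqrt{\alpha\tau}$ with a hypothetical $\alpha = 1.5$; all names and parameter values are illustrative.

```python
import numpy as np

def transition_blocks(delta, gamma=0.0, sigma=1.0, theta=10.0, m_half=40, alpha=1.5):
    """Q and R = [R_B, R_A] for one attribute (assumes Delta = sigma*sqrt(alpha*tau))."""
    step = theta / m_half
    x = (np.arange(1, 2 * m_half) - m_half) * step
    c = (delta - gamma * x) * step / sigma**2
    down, up = (1 - c) / (2 * alpha), (1 + c) / (2 * alpha)
    Q = np.diag(np.full(x.size, 1 - 1 / alpha)) + np.diag(up[:-1], 1) + np.diag(down[1:], -1)
    R = np.zeros((x.size, 2))
    R[0, 0], R[-1, 1] = down[0], up[-1]
    return Q, R

def V_and_B(Q, R, pmf):
    """V_k = sum_i (1 - f_i) Q^i R (eq. 6) and B_k = sum_n p_n Q^n (eq. 7)."""
    f = np.cumsum(pmf)
    V, B = np.zeros_like(R), np.zeros_like(Q)
    Qi = np.eye(Q.shape[0])                       # Q^i
    for i in range(len(pmf)):
        V += (1 - f[i]) * (Qi @ R)
        B += pmf[i] * Qi
        Qi = Qi @ Q
    return V, B

def random_schedule_probs(z, V, B, D_list):
    """Eq. (9), nested: row k of the result is [p_B, p_A] given k_1 = k."""
    K = len(V)
    P = list(V)                                   # last interval: P = V
    for D in reversed(D_list):                    # l = L-1, ..., 1
        P = [V[k] + B[k] @ sum(D[k][j] * P[j] for j in range(K)) for k in range(K)]
    return np.array([z @ P[k] for k in range(K)])

m_half, N = 40, 200
z = np.zeros(2 * m_half - 1)
z[m_half - 1] = 1.0
blocks = [transition_blocks(0.3), transition_blocks(-0.2)]       # K = 2 attributes
pmf = np.zeros(N + 1)
pmf[N] = 1.0                                                      # T_at = N with certainty
V, B = zip(*(V_and_B(Q, R, pmf) for Q, R in blocks))
D = [[0.0, 1.0], [1.0, 0.0]]                                      # alternate attributes
P = random_schedule_probs(z, V, B, [D, D])                        # L = 3

# Deterministic check of row k_1 = 1: order (1, 2, 1), switching steps N, 2N, 3N
dist, p_det = z.copy(), np.zeros(2)
for k in (0, 1, 0):
    Q, R = blocks[k]
    for _ in range(N):
        p_det += dist @ R
        dist = dist @ Q
print(P[0], p_det)
```

Evaluating the nested form costs $L - 1$ block products, independent of the realized attention times, which is the practical advantage over simulating schedules.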



Formulas for mean response times can be derived similarly. Indeed, for $L = 1$, denote by $ET_{A,k}$ the mean response time for reaching alternative A when considering the $k$-th attribute for a random time $T_{at}$ distributed according to (5). Then $ET_{A,k} \approx \tau\, et_{A,k}/p_{A,k}$, where

$$et_{A,k} = \sum_{n=1}^{\infty} p_{n,k}\Big(\sum_{i=0}^{n-1}(i+1)\,Z'Q_k^{i}\Big)R_{A,k} = Z'\sum_{i=0}^{\infty}\Big(\sum_{n=i+1}^{\infty} p_{n,k}\Big)(i+1)\,Q_k^{i}R_{A,k} = Z'\sum_{i=0}^{\infty}(1 - f_{i,k})(i+1)\,Q_k^{i}R_{A,k}. \qquad (10)$$



Similarly for $ET_{B,k}$ and $et_{B,k}$. Thus, similar to (6), we can write

$$et_{AB,k} := [et_{B,k}, et_{A,k}] = Z'W_k, \qquad W_k := \sum_{i=0}^{\infty}(1 - f_{i,k})(i+1)\,Q_k^{i}R_k, \qquad k = 1, \ldots, K. \qquad (11)$$

The matrices $W_k$ can be precomputed to any accuracy at essentially the same cost as the $V_k$. For particular distributions, the formulas can be turned into closed-form expressions.

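Next, let us look at $L = 2$. By using similar notation and arguments as for choice probabilities, the quantities $et_{A,k',k}$ and $et_{B,k',k}$ have a part before and a part after $T^{(1)}_{at}$. This, together with (10) and (11), gives

$$et_{AB}\big|_{k_1=k'} = Z'W_{k'} + \sum_{n=0}^{\infty} p_{n,k'}\,Z'Q_{k'}^{n}\Big(\sum_{k=1}^{K} d_{k'k}(nV_k + W_k)\Big) = Z'\Big(W_{k'} + C_{k'}\sum_{k=1}^{K} d_{k'k}V_k + B_{k'}\sum_{k=1}^{K} d_{k'k}W_k\Big), \qquad (12)$$

where

$$C_k = \sum_{n=0}^{\infty} p_{n,k}\,n\,Q_k^{n}, \qquad k = 1, \ldots, K.$$

Thus, the counterpart of (8) is

$$\mathbf{et}_{AB} = \mathbf{Z}'\big((\mathbf{C}D)\mathbf{V} + (\mathbf{I} + \mathbf{B}D)\mathbf{W}\big). \qquad (13)$$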



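From here, combining with (8), a joint recursion for computing $\mathbf{P}_{AB}$ and $\mathbf{et}_{AB}$ results:

$$\begin{pmatrix} \mathbf{P}_{AB} \\ \mathbf{et}_{AB} \end{pmatrix} = \begin{pmatrix} \mathbf{Z}' & 0 \\ 0 & \mathbf{Z}' \end{pmatrix} \left(\mathbf{I} + \mathbf{N}^{(1)}\Big(\mathbf{I} + \mathbf{N}^{(2)}\big(\cdots(\mathbf{I} + \mathbf{N}^{(L-1)})\cdots\big)\Big)\right) \begin{pmatrix} \mathbf{V} \\ \mathbf{W} \end{pmatrix}, \qquad \mathbf{I} + \mathbf{N}^{(l)} = \begin{pmatrix} \mathbf{I} + \mathbf{B}D^{(l)} & 0 \\ \mathbf{C}D^{(l)} & \mathbf{I} + \mathbf{B}D^{(l)} \end{pmatrix}. \qquad (14)$$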
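A sketch of the joint recursion (14), evaluated from the innermost interval outward: each step maps the pair $(\mathbf{P}, \mathbf{E})$ for the remaining schedule to $(\mathbf{V} + \mathbf{B}D\mathbf{P},\ \mathbf{W} + \mathbf{C}D\mathbf{P} + \mathbf{B}D\mathbf{E})$. With a degenerate attention time ($T_{at} = N$ with certainty), the result should reproduce the deterministic sums in (3) and (4). The chain assumes symmetric thresholds $\theta = 10$, 40 states per side, and $\Delta = \sigma\sqrt{\alpha\tau}$ with a hypothetical $\alpha = 1.5$; names and parameter values are illustrative only.

```python
import numpy as np

def transition_blocks(delta, gamma=0.0, sigma=1.0, theta=10.0, m_half=40, alpha=1.5):
    """Q and R = [R_B, R_A] for one attribute (assumes Delta = sigma*sqrt(alpha*tau))."""
    step = theta / m_half
    x = (np.arange(1, 2 * m_half) - m_half) * step
    c = (delta - gamma * x) * step / sigma**2
    down, up = (1 - c) / (2 * alpha), (1 + c) / (2 * alpha)
    Q = np.diag(np.full(x.size, 1 - 1 / alpha)) + np.diag(up[:-1], 1) + np.diag(down[1:], -1)
    R = np.zeros((x.size, 2))
    R[0, 0], R[-1, 1] = down[0], up[-1]
    return Q, R

def attribute_arrays(Q, R, pmf):
    """V_k (6), W_k (11), B_k (7) and C_k = sum_n n p_n Q^n for one attribute."""
    f = np.cumsum(pmf)
    V, W = np.zeros_like(R), np.zeros_like(R)
    B, C = np.zeros_like(Q), np.zeros_like(Q)
    Qi = np.eye(Q.shape[0])                          # Q^i
    for i in range(len(pmf)):
        QiR = Qi @ R
        V += (1 - f[i]) * QiR
        W += (1 - f[i]) * (i + 1) * QiR
        B += pmf[i] * Qi
        C += i * pmf[i] * Qi
        Qi = Qi @ Q
    return V, W, B, C

def joint_recursion(z, arrays, D_list):
    """Eq. (14): rows of P are [p_B, p_A], rows of E are [et_B, et_A], given k_1 = k."""
    V, W, B, C = (list(a) for a in zip(*arrays))
    K = len(V)
    P, E = list(V), list(W)                          # last interval
    for D in reversed(D_list):                       # l = L-1, ..., 1
        DP = [sum(D[k][j] * P[j] for j in range(K)) for k in range(K)]
        DE = [sum(D[k][j] * E[j] for j in range(K)) for k in range(K)]
        P = [V[k] + B[k] @ DP[k] for k in range(K)]
        E = [W[k] + C[k] @ DP[k] + B[k] @ DE[k] for k in range(K)]
    return np.array([z @ p for p in P]), np.array([z @ e for e in E])

m_half, N = 40, 200
z = np.zeros(2 * m_half - 1)
z[m_half - 1] = 1.0
blocks = [transition_blocks(0.3), transition_blocks(-0.2)]
pmf = np.zeros(N + 1)
pmf[N] = 1.0
arrays = [attribute_arrays(Q, R, pmf) for Q, R in blocks]
D = [[0.0, 1.0], [1.0, 0.0]]
P, E = joint_recursion(z, arrays, [D, D])            # L = 3

# Deterministic sums of eqs. (3)-(4) for k_1 = 1, order (1, 2, 1)
dist, p_det, e_det, i = z.copy(), np.zeros(2), np.zeros(2), 0
for k in (0, 1, 0):
    Q, R = blocks[k]
    for _ in range(N):
        i += 1
        absorbed = dist @ R
        p_det += absorbed
        e_det += i * absorbed
        dist = dist @ Q
print(E[0] / P[0], e_det / p_det)   # mean decision times in steps, [B, A]
```

Multiplying the step counts by $\tau$ converts them into mean choice response times, as in (4).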



We conclude this section with a few remarks. In Diederich (1997), under the name MADD/pp, a slightly different presentation of random schedules is given for the special case of geometrically distributed attention times. It is not hard to see that (with the notation $r_{ij}$ used in the K = 3 example presented in Section 4.2 of Diederich, 1997) our model is equivalent to MADD/pp as $L \to \infty$ if we set $r_k = 1 - r_{kk}$ for the parameters $r_k$ of the geometrically distributed $T_{at}$, k = 1, 2, 3, and $d_{kk} = 0$, $d_{kk'} = r_{kk'}/(1 - r_{kk})$, $k' \neq k$, for the entries of the matrix $D = D^{(l)}$, $l \geq 1$. The advantage of the MADD/pp model is that it provides closed-form formulas for the case $L = \infty$, a possibility that we did not pursue here for other types of attention time distributions.
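The parameter mapping to MADD/pp can be applied mechanically. The sketch below converts a hypothetical K = 3 matrix of MADD/pp probabilities $r_{kk'}$ (assumed to have rows summing to one) into $r_k = 1 - r_{kk}$ and the switching matrix with $d_{kk} = 0$, $d_{kk'} = r_{kk'}/(1 - r_{kk})$. Exact fractions make it easy to see that each row of D sums to one. The mean attention time $1/r_k$ per visit assumes $P(T_{at} = n) = r_k(1-r_k)^{n-1}$, $n \geq 1$, which is our reading of the geometric parameterization.

```python
from fractions import Fraction

# Hypothetical MADD/pp switching probabilities r[k][k'] for K = 3 (each row sums to one).
r = [[Fraction(7, 10), Fraction(1, 5), Fraction(1, 10)],
     [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)],
     [Fraction(0), Fraction(3, 5), Fraction(2, 5)]]


def maas_parameters(r):
    """Return r_k = 1 - r_kk and D with d_kk = 0, d_kk' = r_kk' / (1 - r_kk)."""
    K = len(r)
    r_k = [1 - r[k][k] for k in range(K)]
    D = [[Fraction(0) if j == k else r[k][j] / r_k[k] for j in range(K)] for k in range(K)]
    return r_k, D


r_k, D = maas_parameters(r)
# Mean attention time per visit, assuming P(T_at = n) = r_k (1 - r_k)^(n - 1), n >= 1.
mean_attention = [1 / x for x in r_k]
for k in range(len(r)):
    print(f"attribute {k + 1}: r_k = {r_k[k]}, D row = {[str(d) for d in D[k]]}, "
          f"E(T_at) = {mean_attention[k]}")
```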

In previous sequential decision models with finite L (Diederich, 1997), the last attribute was always considered infinitely long (infinite decision horizon) to avoid the situation of no decision, i.e., $p_0 > 0$. This can be incorporated into the current model by modifying the definition of the matrices $V_k$, $W_k$ corresponding to the last interval $[t_{L-1}, \infty)$ to

$$V_k = (I - Q_k)^{-1} R_k, \qquad W_k = (I - Q_k)^{-2} R_k, \qquad k = 1, \dots, K,$$

and modifying the recursion (14) slightly. Alternatively, one can artificially change the parameters of the attention time distribution for $l = L$ such that its expected value is sufficiently large, making $p_0$ practically negligible. Since infinite decision horizons do not seem to adequately reflect the situation of a real decision process or laboratory experiment, it might be interesting to work under the scenarios described in this paper, where $t_L$ is finite.
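The effect of lengthening the last interval can be illustrated by propagating the state distribution step by step. The sketch below uses hypothetical constant-drift birth-death chains and deterministic attention times (attribute 1 for $n_1$ steps, then attribute 2 for $n_2$ steps); it computes $p_0$ and $p_A$ for increasingly long last intervals and compares $p_A$ with the infinite-horizon value, which for a constant-drift chain follows from the classical gambler's-ruin formula. All numbers are illustrative assumptions.

```python
# Hypothetical constant-drift chains; deterministic attention times.

def run_stage(y, up, n_steps):
    """Propagate row vector y for n_steps of a birth-death chain with constant up-probability.
    Returns the absorbed mass [p_B, p_A] and the surviving (transient) row vector."""
    m, down = len(y), 1 - up
    absorbed = [0.0, 0.0]
    for _ in range(n_steps):
        new = [0.0] * m
        for i, mass in enumerate(y):
            if i == 0:
                absorbed[0] += mass * down          # lower boundary: B
            else:
                new[i - 1] += mass * down
            if i == m - 1:
                absorbed[1] += mass * up            # upper boundary: A
            else:
                new[i + 1] += mass * up
        y = new
    return absorbed, y


m, up1, up2, n1 = 9, 0.45, 0.6, 20          # attribute 1 favors B, attribute 2 favors A
Z = [0.0] * m
Z[m // 2] = 1.0
a1, y1 = run_stage(Z, up1, n1)              # first attribute for n1 steps

# Infinite last interval: gambler's-ruin probability of reaching A from position i + 1.
r = (1 - up2) / up2
hit_A = [(1 - r ** (i + 1)) / (1 - r ** (m + 1)) for i in range(m)]
pA_inf = a1[1] + sum(mass * h for mass, h in zip(y1, hit_A))

p0, pA = [], []
for n2 in (10, 50, 200, 1000):              # increasingly long finite last intervals
    a2, y2 = run_stage(y1, up2, n2)
    p0.append(sum(y2))
    pA.append(a1[1] + a2[1])
print("p_0:", p0)
print("p_A:", pA, "infinite horizon:", pA_inf)
```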
4. SIMULATIONS

We present some simulations that demonstrate the predictive power of the proposed model. We focus on features that were not considered in Diederich (1997) for the deterministic case. Throughout this section we fix certain parameters, such as $\sigma = 1$, $\theta_A = -\theta_B = 10$, $\Delta = 1/4$, $\tau = 1/16$ (this implies a state space size of m = 81), and always start at the neutral position $X(0) = 0$ between choice alternatives A and B.
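The state-space size follows from the boundaries and the step size: the grid $\theta_B, \theta_B + \Delta, \dots, \theta_A$ with $\Delta = 1/4$ has $2\theta_A/\Delta + 1 = 81$ points, 79 of them strictly between the boundaries. The sketch below builds this grid and a birth-death approximation of the Ornstein-Uhlenbeck process for one attribute. The transition probabilities $\tfrac12\big(1 \pm (\delta - \gamma x)\sqrt{\tau}/\sigma\big)$, together with the coupling $\Delta = \sigma\sqrt{\tau}$, are an assumed standard choice and not necessarily the exact form used in the paper. The sketch also locates the point $x^* = \delta_1/\gamma_1$ where the effective drift $\delta_1 - \gamma_1 x$ of the first attribute of Section 4.1 changes sign, and the naive Wiener-process estimate $\theta_A/\delta_1$ of the mean response time.

```python
import math

SIGMA, THETA, DX, TAU = 1.0, 10.0, 0.25, 1.0 / 16   # sigma, theta_A = -theta_B, Delta, tau

n_half = round(THETA / DX)                           # 40 grid steps from 0 to theta_A
grid = [k * DX for k in range(-n_half, n_half + 1)]  # theta_B, ..., theta_A
m = len(grid)                                        # states including both boundaries
transient = grid[1:-1]                               # states strictly between the boundaries


def ou_step_probs(delta, gamma, xs):
    """Assumed birth-death approximation of dX = (delta - gamma X) dt + sigma dW:
    up/down probabilities 0.5 * (1 +/- (delta - gamma x) sqrt(tau) / sigma)."""
    ups = [0.5 * (1 + (delta - gamma * x) * math.sqrt(TAU) / SIGMA) for x in xs]
    return ups, [1 - u for u in ups]


delta1, gamma1 = 0.2, 0.03                           # first attribute in Section 4.1
ups, downs = ou_step_probs(delta1, gamma1, transient)
x_star = delta1 / gamma1                             # effective drift changes sign here
naive_T = THETA / delta1                             # Wiener-process estimate of T_A
print(m, len(transient), round(x_star, 2), naive_T)
```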

4.1. IMPACT OF ATTENTION TIME DISTRIBUTIONS

First, we show how different assumptions on the randomness of the attention time $T_{at}$ (i.e., the time spent on considering a certain attribute) influence choice probabilities and mean response times. In the first example, we assume just two attributes with parameters $\delta_1 = 0.2$, $\gamma_1 = 0.03$, $\delta_2 = 0.04$, $\gamma_2 = 0.003$; both attributes favor alternative A, the first one more strongly than the second one¹. The attributes are considered only once (L = 2), with order $k_1 = 1$, $k_2 = 2$. The first attribute is considered for time $t_1 = \tau n_1$, where $n_1$ is a random variable $T_{at}$ as described above with given expectation N. For the second attribute we compare two situations: (1) we assume an infinitely long decision horizon $t_2 = \infty$, and (2) we determine a finite time horizon $t_2 = \tau n_2$ by choosing $n_2 = n_1 + T_{at}$, where the added attention time is again $T_{at}$ distributed with the same expected value N. These two situations are depicted in Figures 4 and 5. The graphs show choice probabilities and mean response times as functions of the expectation $\tau E(T_{at})$ of the real attention times. Lines of different color represent different distributions. Distributions with a small variance, such as the Poisson distribution, the binomial distribution, and the uniform distribution with $M \approx \sqrt{N}$, produce results indistinguishable from the deterministic case. This holds for all tested situations shown below. This means that small uncertainties in attention time spans do not influence the observable choice frequencies and mean response times. However, as the variance of the attention times grows, we see quantitative and qualitative changes. Compared to the deterministic attention time situation, the geometric distribution differs most, and the uniform distributions with M = N/2 = 150 (Unif 1) and M = N − 1 = 299 (Unif 2) are intermediate. Moreover, as expected, there is a large difference between finite and infinite decision horizons for small mean attention times. Most importantly, in the finite case the model predicts a probability $p_0 > 0$ of not deciding within the available time $t_2$. We claim that for many situations in which an infinite time horizon does not represent reality well enough, our finite schedule model might be more appealing. This aspect will be pursued in further research.

¹Note that when looking only at the numerical values of the drift parameter $\delta_1 = 0.2$ and the decision criterion $\theta_A = 10$, and assuming that the attention times $t_1$ to the first attribute are large enough, one would expect mean response times in the range $T_A \approx 50$ (and a very small $p_B$). However, since $\gamma_1 = 0.03$, the effective drift $\delta_1 - \gamma_1 X(t)$ becomes negative when $X(t)$ comes close to $\theta_A$ (for $X(t) > \delta_1/\gamma_1 \approx 6.7$), and the mean response times become much longer. This also demonstrates the effect of the parameter $\gamma_k$, and a difference between Ornstein-Uhlenbeck process and Wiener process based models.

Figures 6 and 7 show similar simulation results for the situation of considering first an attribute favoring B ($\delta_1 = -0.1$, $\gamma_1 = 0$) followed by an attribute more strongly favoring A ($\delta_2 = 0.2$, $\gamma_2 = 0.03$). As expected, the results now look different; however, the main conclusions from the previous example concerning the influence of the randomness type for attention times and the differences for finite vs. infinite time horizons remain the same. Most importantly here, the model predicts a preference reversal (i.e., choice probabilities changing from below 0.5 to above 0.5) as a function of attention time when one attribute is in favor of choosing alternative A and the other in favor of choosing alternative B. Parameter studies, as in Diederich (1997), will be pursued elsewhere.

FIGURE 4 | Choice probabilities (A,C) and mean response times (B,D) as functions of the expected attention time $E(t_1) = 10, \dots, 500$ paid to the first attribute for different distribution types. The attribute considered first for a random time $t_1$ strongly favors alternative A, followed by a second attribute which only weakly favors A but is considered indefinitely. Note that graphs for distribution types with small variance are almost indistinguishable from the graph corresponding to deterministically fixed $t_1$ (variance 0) and therefore are omitted here.

FIGURE 5 | Same as in Figure 4 but now the second attribute is also considered for a random finite time $t_2 - t_1$ whose distribution is the same as for $t_1$ [in particular, $E(t_2 - t_1) = E(t_1)$]. (A) and (B) show the choice probabilities for choosing alternative A and B, respectively. (C) shows the probability $p_0$ of not reaching a decision, which naturally decays as the expected attribute attention time grows. (D) and (E) show the expected mean response times for choosing alternative A and B, respectively, as functions of the expected attention time $E(t_1) = 10, \dots, 500$ paid to the first attribute for different distribution types.
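The distribution types compared in Figures 4 and 5 differ mainly in their variance. The sketch below shows one way to give them a common mean N; the exact supports are our assumptions: Poisson with mean N, binomial with 2N trials and success probability 1/2, uniform on $\{N - M, \dots, N + M\}$, and geometric on $\{1, 2, \dots\}$ with success probability 1/N. With N = 300 this reproduces the uniform cases M = 150 (Unif 1) and M = 299 (Unif 2) mentioned above, and M = 17 ≈ √300 for the small-variance uniform case.

```python
import math

N = 300


def poisson(mean, nmax):
    # log-space evaluation avoids overflow of mean**n and n!
    return {n: math.exp(n * math.log(mean) - mean - math.lgamma(n + 1)) for n in range(nmax + 1)}


def binomial(trials, p):
    return {n: math.exp(math.lgamma(trials + 1) - math.lgamma(n + 1) - math.lgamma(trials - n + 1)
                        + n * math.log(p) + (trials - n) * math.log(1 - p))
            for n in range(trials + 1)}


def uniform(center, half_width):
    support = range(center - half_width, center + half_width + 1)
    return {n: 1.0 / len(support) for n in support}


def geometric(p, nmax):
    # truncated far in the tail; the neglected mass (1 - p)**nmax is negligible here
    return {n: p * (1 - p) ** (n - 1) for n in range(1, nmax + 1)}


def moments(pmf):
    mean = sum(n * p for n, p in pmf.items())
    var = sum((n - mean) ** 2 * p for n, p in pmf.items())
    return mean, var


dists = {
    "deterministic": {N: 1.0},
    "Poisson": poisson(N, 3 * N),
    "binomial": binomial(2 * N, 0.5),
    "Unif sqrt": uniform(N, round(math.sqrt(N))),   # M = 17
    "Unif 1": uniform(N, N // 2),                   # M = 150
    "Unif 2": uniform(N, N - 1),                    # M = 299
    "geometric": geometric(1.0 / N, 40 * N),
}
stats = {name: moments(pmf) for name, pmf in dists.items()}
for name, (mean, var) in stats.items():
    print(f"{name:14s} mean = {mean:8.2f}  variance = {var:10.1f}")
```

The variances range from 0 (deterministic) through about N (Poisson) to $M(M+1)/3$ for the uniform cases and $N(N-1)$ for the geometric case, which matches the ordering of deviations from the deterministic case described above.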

To complete the picture, we show a three-attribute example (K = 3) in Figure 8. The chosen attribute parameters are now $\delta_1 = 0.04$, $\gamma_1 = 0.003$, $\delta_2 = -0.1$, $\gamma_2 = 0$, $\delta_3 = 0.2$, $\gamma_3 = 0.03$, i.e., a sequence of attributes weakly in favor of A, in favor of B, and strongly in favor of A. Attention times for the first two attributes are chosen independently from each other but with the same distribution with fixed mean value; the last attribute is considered indefinitely.

4.2. DEPENDENCE ON ATTRIBUTE ORDER 

The proposed sequential decision model is sensitive to the order in which the attributes are considered. If, in the aforementioned second two-attribute example, we consider the attribute in favor of A first and then the attribute in favor of B, we get very different patterns, as shown in Figure 9 compared to Figure 6. A similar effect holds for the above K = 3 example. In Figure 10, the attribute in favor of B is now the last one; the graphs need to be compared with Figure 8. One interesting pattern can be observed. If the evidence for choosing one alternative decreases over the sequence of attribute consideration, then the model predicts faster choice response times for the more frequently chosen alternative, a typical pattern observed in response time analysis. However, if the evidence increases over the sequence of attribute consideration, then the model predicts faster choice response times for the less frequently chosen alternative, which has been called fast error, as shown in Figure 11 compared to Figure 4. Simply by changing the order of attribute processing, the model predicts a complex pattern of choice response times and choice probabilities.
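The order effect can be explored numerically with the Section 4 constants. The sketch below assumes the birth-death approximation $\tfrac12\big(1 \pm (\delta - \gamma x)\sqrt{\tau}/\sigma\big)$ (our assumption, not necessarily the paper's exact form), a deterministic attention time of $n_1 = 160$ steps ($t_1 = 10$, an arbitrary choice; the text reports small-variance distributions to be indistinguishable from the deterministic case), and the second attribute considered indefinitely. It compares the order (strong, weak) of Figure 4 with (weak, strong) of Figure 11, computing the infinite last interval with a tridiagonal solve of $(I - Q)v = r$ and $(I - Q)w = v$. It reports choice probabilities and mean response times but is not meant to reproduce the published curves.

```python
import math

SIGMA, THETA, DX, TAU = 1.0, 10.0, 0.25, 1.0 / 16
XS = [k * DX for k in range(-39, 40)]               # 79 transient states
STRONG, WEAK = (0.2, 0.03), (0.04, 0.003)           # (delta, gamma) from Section 4.1


def probs(attr):
    """Assumed birth-death approximation: up/down = 0.5 (1 +/- (delta - gamma x) sqrt(tau)/sigma)."""
    delta, gamma = attr
    ups = [0.5 * (1 + (delta - gamma * x) * math.sqrt(TAU) / SIGMA) for x in XS]
    return ups, [1 - u for u in ups]


def solve_tridiag(lower, diag, upper, rhs):
    """Thomas algorithm for lower[i] x[i-1] + diag[i] x[i] + upper[i] x[i+1] = rhs[i]."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    for i in range(n):
        denom = diag[i] - (lower[i] * c[i - 1] if i else 0.0)
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - (lower[i] * d[i - 1] if i else 0.0)) / denom
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = d[i] - (c[i] * x[i + 1] if i < n - 1 else 0.0)
    return x


def two_stage(first, last, n1):
    """(p_A, p_B, ET_A, ET_B): `first` attribute for n1 steps, then `last` indefinitely."""
    n = len(XS)
    u1, d1 = probs(first)
    y = [0.0] * n
    y[n // 2] = 1.0                                 # start at X(0) = 0
    p, et = [0.0, 0.0], [0.0, 0.0]                  # index 0 = B, 1 = A
    for step in range(1, n1 + 1):
        absorbed = (y[0] * d1[0], y[-1] * u1[-1])
        for j in (0, 1):
            p[j] += absorbed[j]
            et[j] += step * absorbed[j]
        new = [0.0] * n
        for i, mass in enumerate(y):
            if i > 0:
                new[i - 1] += mass * d1[i]
            if i < n - 1:
                new[i + 1] += mass * u1[i]
        y = new
    # Infinite last interval: v = (I - Q)^{-1} r and w = (I - Q)^{-2} r for each boundary.
    u2, d2 = probs(last)
    lower, upper = [-d for d in d2], [-u for u in u2]
    for j, r in ((0, [d2[0]] + [0.0] * (n - 1)), (1, [0.0] * (n - 1) + [u2[-1]])):
        v = solve_tridiag(lower, [1.0] * n, upper, r)
        w = solve_tridiag(lower, [1.0] * n, upper, v)
        pv = sum(a * b for a, b in zip(y, v))
        p[j] += pv
        et[j] += n1 * pv + sum(a * b for a, b in zip(y, w))
    return p[1], p[0], TAU * et[1] / p[1], TAU * et[0] / p[0]


n1 = 160                                            # t_1 = tau * n1 = 10 time units (arbitrary)
orders = {"strong -> weak": (STRONG, WEAK), "weak -> strong": (WEAK, STRONG)}
results = {label: two_stage(*order, n1) for label, order in orders.items()}
for label, (pA, pB, etA, etB) in results.items():
    print(f"{label}: p_A = {pA:.3f}, p_B = {pB:.3f}, ET_A = {etA:.1f}, ET_B = {etB:.1f}")
```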

So far, all examples shown use a fixed, deterministic attribute order with no repetitions (semi-random schedule, L = K). The evaluation of fully random time and order schedules requires larger L and will be presented elsewhere.

FIGURE 6 | Choice probabilities (A,C) and mean response times (B,D) for a decision situation where an attribute favoring alternative B is considered first for a random time $t_1$, followed by a second attribute strongly favoring A but considered indefinitely. We show graphs of choice probabilities and mean response times as functions of the expected attention time $E(t_1) = 10, \dots, 500$ paid to the first attribute for different distribution types. Again, graphs for distribution types with small variance are indistinguishable from each other.

FIGURE 7 | Same as in Figure 6 but now the second attribute is also considered for a random finite time $t_2 - t_1$ whose distribution is the same as for $t_1$. (A), (B), and (C) show the choice probabilities for choosing alternatives A, B, and none, respectively. (D) and (E) show the mean response times for choosing alternatives A and B, respectively.

FIGURE 8 | Choice probabilities (A,C) and mean response times (B,D) for a decision model with three attributes. An attribute weakly favoring alternative A is considered first for a random time $t_1$, followed by a second attribute favoring B considered for a random time $t_2 - t_1$, while the last attribute (strongly favoring A) is considered indefinitely. The random attention times $t_1$ and $t_2 - t_1$ for the first two attributes are independently chosen from the same distribution. We show graphs of choice probabilities and mean response times as functions of the expected attention time $E(t_1) = E(t_2 - t_1) = 10, \dots, 500$ for different distribution types. Again, small variance distributions yield almost identical results.

FIGURE 9 | Same as in Figure 6 but with a different attribute order: first the attribute strongly in favor of A is considered for a finite random time $t_1$, then the attribute favoring B is considered indefinitely long. (A) and (C) show the choice probabilities for choosing alternatives A and B, respectively. (B) and (D) show the mean response times for choosing alternatives A and B, respectively.

FIGURE 10 | Same as in Figure 8 but with a different attribute order: first the two attributes in favor of A (strong followed by weak) are considered for finite random periods of time, then the attribute favoring B is considered indefinitely long. (A) and (C) show the choice probabilities for choosing alternatives A and B, respectively. (B) and (D) show the mean response times for choosing alternatives A and B, respectively.

FIGURE 11 | Same as in Figure 4 but with a different attribute order: the attribute considered first for a random time $t_1$ weakly favors alternative A, followed by a second attribute which strongly favors A but is considered indefinitely. (A) and (C) show the choice probabilities for choosing alternatives A and B, respectively. (B) and (D) show the mean response times for choosing alternatives A and B, respectively.

5. CONCLUDING REMARKS

The proposed multiattribute attention switching (MAAS) model can predict a very complex choice probability/(mean) choice response time pattern. It may appear too flexible to be testable. However, this is not the case. If two attributes both favor alternative A, say, and the first attribute that is considered provides more evidence for choosing A than the second ($\delta_1 > \delta_2$), then the model always predicts shorter response times for the more frequently chosen alternative, here A, regardless of the assumed underlying attention time distribution. If the order of processing these attributes is reversed, i.e., the attribute that favors alternative A less is considered first ($\delta_2 > \delta_1$), then the model always predicts faster responses for the less frequently chosen alternative, here B, again regardless of the assumed underlying attention time distribution. A single stage process can only account for this pattern by assuming variability in starting positions and variability in drift rates, i.e., a statistical means where the drift rate itself is a random variable. It is experimentally difficult to disentangle the variability stemming from the stochastic process itself and the variability from the distribution of different drift rates. As Jones and Dzhafarov (2013) pointed out, the predictions of various sequential sampling models rest upon the assumptions made about the underlying probability distributions. This is not the case here. The model is falsifiable without assuming specific distributions. Rather than relying on statistical mechanisms to produce an observed response pattern, we rely on assumptions about cognitive processes such as attention switching and salience. The specific attention time distribution used for an application may be related to the experimental paradigm. For instance, when tracking eye movements, the sequence of attribute consideration and the switching times are directly observable, and a deterministic or a uniform distribution with a small variance is advisable. When all attributes are shown simultaneously, as in complex objects, and attention may shift at any moment in time, a geometric distribution or a uniform distribution with a large variance may describe the situation better. Testing the model rigorously will be pursued in the future.